Abstract

I will be trying to find accelerating/decelerating dealer usage.

Load Data & Libraries

library(tidyverse)
library(ggplot2)
library(lubridate)
library(plotly)
data <- read.csv("20230921 appraisal activity.csv")
head(data)
##           create_date               vin        odometer dealer_id
## 1 2023-09-21 22:26:21 5FNRL5H97GB107752  46663.00000000     31233
## 2 2023-09-21 22:26:16 KM8J3CAL4MU408744  35644.00000000      1824
## 3 2023-09-21 22:26:07 1C4RDJDG3MC642161  60000.00000000      4562
## 4 2023-09-21 22:26:06 JTMRFREV5FD135295 130847.00000000     30295
## 5 2023-09-21 22:26:04 JM1NC25F570132039  34234.00000000       153
## 6 2023-09-21 22:26:01 5FNYF6H50LB051105  18784.00000000       218
##                       dealer_name dealer_contact_province dealer_contact_postal
## 1 Jim Falk Lexus of Beverly Hills                      CA                 90212
## 2          UNIQUE AUTO SALES, LLC                      GA                 31004
## 3                    Tesla Motors                      BC               V7T 2Z3
## 4         Woodhouse Auto Exchange                      NE                 68118
## 5                      John Wolfe                      TX                 76180
## 6              Galves Market Data                      NJ                 07608
##   country source_type ph_year ph_manuf            ph_model         ph_body
## 1       1           6    2016    HONDA     ODYSSEY TOURING 4 DOOR WAGON 7P
## 2       1           1    2021  HYUNDAI        TUCSON SPORT      4 DOOR SUV
## 3       2           1    2021    DODGE          DURANGO GT      4 DOOR SUV
## 4       1           6    2015   TOYOTA            RAV4 XLE      4 DOOR SUV
## 5       1           1    2007    MAZDA MIATA GRAND TOURING     CONVERTIBLE
## 6       1           1    2020    HONDA      PILOT EX-L AWD      4 DOOR SUV
##   ph_engtype trade market           source
## 1    3.5L V6 19875  21225 sirius appraisal
## 2 2.4L 4 CYL 19550  20950 sirius appraisal
## 3    3.6L V6 35150  37225 sirius appraisal
## 4 2.5L 4 CYL  8050   9250   Dealer Website
## 5 2.0L 4 CYL  9400  10450 sirius appraisal
## 6    3.5L V6 30475  32025 sirius appraisal

Exploratory Data Analysis

Let’s take a deeper look into the data and see which variables will be useful in helping us see which dealers are increasing or decreasing usage of the tool.

colnames(data)
##  [1] "create_date"             "vin"                    
##  [3] "odometer"                "dealer_id"              
##  [5] "dealer_name"             "dealer_contact_province"
##  [7] "dealer_contact_postal"   "country"                
##  [9] "source_type"             "ph_year"                
## [11] "ph_manuf"                "ph_model"               
## [13] "ph_body"                 "ph_engtype"             
## [15] "trade"                   "market"                 
## [17] "source"
which(rowSums(is.na(data)) > 0)
## [1] 2196912

create_date: useful to gather a time period of data input.
dealer_name: so we know which dealer is which.
dealer_id: same use

This should be enough to find out which dealers are accelerating and decelerating usage. We also notice that a single row, 2196912 is NA. we will remove this row.

print(data[2196912,])
##         create_date vin odometer dealer_id dealer_name dealer_contact_province
## 2196912    2023-04-                     NA                                    
##         dealer_contact_postal country source_type ph_year ph_manuf ph_model
## 2196912                            NA          NA      NA                  
##         ph_body ph_engtype trade market source
## 2196912                       NA     NA
data <- na.omit(data) 

We have removed the missing row.

Wrangle the data

Let’s select our desired cols, and convert to proper type.

dealer <- data |> 
  select(create_date, dealer_id, dealer_name) |> # select wanted cols
  mutate(create_date = as.Date(create_date)) # convert to date format

Great, now we can parse each date to weekly intervals. By R defaults, weeks start on Monday.

week_frame <- dealer |> 
  mutate(week = week(create_date))

Let’s take a look at what we have so far.

head(week_frame)
##   create_date dealer_id                     dealer_name week
## 1  2023-09-21     31233 Jim Falk Lexus of Beverly Hills   38
## 2  2023-09-21      1824          UNIQUE AUTO SALES, LLC   38
## 3  2023-09-21      4562                    Tesla Motors   38
## 4  2023-09-21     30295         Woodhouse Auto Exchange   38
## 5  2023-09-21       153                      John Wolfe   38
## 6  2023-09-21       218              Galves Market Data   38

Now that everything is nice and formatted, we can group by week and dealer to give a count of how many times each dealer used the tool that week.

dealer_grouped <- week_frame |> 
  group_by(week, dealer_id, dealer_name) |> 
  summarise(weekly_count = n())

head(dealer_grouped)
## # A tibble: 6 × 4
## # Groups:   week, dealer_id [6]
##    week dealer_id dealer_name              weekly_count
##   <dbl>     <int> <chr>                           <int>
## 1    14        17 Robinson GMC                       25
## 2    14        18 Accu-Trade                          3
## 3    14        23 R Hollenshead Auto Sales         1510
## 4    14        70 Great Northern Auction             28
## 5    14        82 Towne Mazda                         4
## 6    14       153 John Wolfe                      10109

Week 14 is the start week. Data begins in April, the 14th week of the year.

Computing Weekly Change

Visualizing 5000+ different dealers can be tough. What we can do is find out n dealers who have had the largest differences in usage from the beginning to the end of the time period. First we need to re-arrange our data a tad bit.

usage_data <- dealer_grouped |> 
  arrange(dealer_id, week) |> 
  group_by(dealer_id)

head(usage_data)
## # A tibble: 6 × 4
## # Groups:   dealer_id [2]
##    week dealer_id dealer_name                weekly_count
##   <dbl>     <int> <chr>                             <int>
## 1    31        15 Miller Hughes Ford Lincoln            1
## 2    14        17 Robinson GMC                         25
## 3    15        17 Robinson GMC                         44
## 4    16        17 Robinson GMC                         51
## 5    17        17 Robinson GMC                         57
## 6    18        17 Robinson GMC                         64

Now we have a good format, where each dealer is in order and the weeks in which they used the tool are in order. Let’s go on to calculate weekly change.

usage_data <- usage_data |> 
  mutate(weekly_change = weekly_count - lag(weekly_count),
         weekly_change = ifelse(is.na(weekly_change), 0, weekly_change))

head(usage_data)
## # A tibble: 6 × 5
## # Groups:   dealer_id [2]
##    week dealer_id dealer_name                weekly_count weekly_change
##   <dbl>     <int> <chr>                             <int>         <dbl>
## 1    31        15 Miller Hughes Ford Lincoln            1             0
## 2    14        17 Robinson GMC                         25             0
## 3    15        17 Robinson GMC                         44            19
## 4    16        17 Robinson GMC                         51             7
## 5    17        17 Robinson GMC                         57             6
## 6    18        17 Robinson GMC                         64             7

Now, we know the difference in use between each week and each dealer. We can find the average of this to help us determine which dealers are accelerating/decelerating usage.

usage_data_avg <- usage_data |> 
  summarise(average_change = mean(weekly_change))

head(usage_data_avg)
## # A tibble: 6 × 2
##   dealer_id average_change
##       <int>          <dbl>
## 1        15           0   
## 2        17           0.36
## 3        18           0   
## 4        23          47.6 
## 5        70           1.32
## 6        82           0.32

Now we can see the top accelerating/decelerating dealers based off their average product usage.

# add back in dealer_name

# create reference frame
ref <- dealer |> 
  select(dealer_id, dealer_name) |> 
  distinct()

# merge
merged_df <- merge(usage_data_avg, ref, by = "dealer_id")

merged_df$dealer_name <- strtrim(merged_df$dealer_name, width = 35)

# gather top and bottom 5 (changed to 10)
top_10 <- merged_df |> 
  arrange(desc(average_change)) |> 
  head(10)

bottom_10 <- merged_df |> 
  arrange(average_change) |> 
  head(10)

head(top_10)
##   dealer_id average_change                         dealer_name
## 1     32233      123.50000                         AE of Miami
## 2     32158      122.16667             Elco Chevrolet Cadillac
## 3       153       84.36000                          John Wolfe
## 4     10380       83.72000                 Cameron Motorsports
## 5     22292       73.36000                    Carriage Traders
## 6     32238       66.33333 Crown Motors Ltd. - Buick Cadillac
head(bottom_10)
##   dealer_id average_change                  dealer_name
## 1       955      -37.33333           Demo Dealer Portal
## 2     28998      -34.60000 Germain Honda of BeaverCreek
## 3     30564      -32.50000               Volvo of Lisle
## 4     26562      -30.50000           Test Dealership FL
## 5     29107      -26.08696 Germain Buying Center Dayton
## 6      6341      -24.75000             Bo Beuckman Ford

Visualize

acceleration_plot <- top_10 |> 
  ggplot(aes(x = reorder(dealer_name, -average_change),
             y = average_change,
             fill = average_change)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = 'blue', high = 'purple') +
  labs(x = 'Dealer', y = 'Average Weekly Change',
       title = 'Top 10 Accelerating Dealers') +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, vjust = 0.5, hjust = 1),
        axis.text.x = element_text(angle = 40, vjust = 0.5, size = 7))

acceleration_plot

deceleration_plot <- bottom_10 |> 
  ggplot(aes(x = reorder(dealer_name, -average_change),
             y = average_change,
             fill = average_change)) +
  geom_bar(stat = "identity", position = "identity") +
  scale_fill_gradient(low = 'blue', high = 'purple') +
  labs(x = 'Dealer', y = 'Average Weekly Change',
       title = 'Top 10 Decelerating Dealers') +
  theme_minimal() +
  theme(axis.text.y = element_text(angle = 0, vjust = 0.5, hjust = 1),
        axis.text.x = element_text(angle = 40, vjust = 0.5, size = 7))

deceleration_plot 

These two plots portray the top accelerating/decelerating dealers. In Top 10 Accelerating Dealers, we can see that AE of Miami increased their product usage by an average of 123 times each week. In Top 10 Decelerating Dealers, we can see that Demo Dealer Portal decreased their average usage by 37 times each week. This doesn’t seem to be a real dealer so Germain Honda of Beaver Creek was probably the top decelerator with an average decrease of 34.5 less inputs each week. Top decelerators are important to notice because it could lead to a loss of customer.

Weekly Breakdown

We can track the usage across all weeks for the top accelerators/decelerators. This will help us gauge which week was the most influential and how long they actually used the product for. These visuals are interactive and have some features like comparing stats if you click the two stacked bars.

Accelerators

Since a line plot will get too confusing, we will narrow down to top 5 accelerators and decelerators.

top_accelerating_dealers <- top_10[1:5, ] |> 
  select(dealer_id, average_change) |> 
  merge(usage_data, by = "dealer_id") |> 
  arrange(dealer_id, week)
acceleration_plot <- plot_ly(top_accelerating_dealers,
                x = ~week, y = ~weekly_change,
                color = ~factor(dealer_id),
                text = ~paste(dealer_name, "<br>Average Change:", round(average_change, 2))) |> 
  add_lines() |> 
  layout(
    title = "Weekly Change for Top 5 Accelerators",
    xaxis = list(title = "Week"),
    yaxis = list(title = "Weekly Change"),
    showlegend = FALSE,
    annotations = list(
      text = "Notice: This is an interactive plot. There are<br> some shorter lines at weeks 33 and 35.<br> You can zoom in on these.",
      x = 0.85,  # Adjust the x-coordinate for caption placement
      y = 0.1,  # Adjust the y-coordinate for caption placement
      xref = "paper",
      yref = "paper",
      showarrow = FALSE
    )
  )

acceleration_plot

Some of our top accelerators overall have just began using our tool. Others have been around for the whole time period.

Decelerators

top_decelerating_dealers <- bottom_10[1:5, ] |> 
  select(dealer_id, average_change) |> 
  merge(usage_data, by = "dealer_id") |> 
  arrange(dealer_id, week)
deceleration_plot <- plot_ly(top_decelerating_dealers,
                x = ~week, y = ~weekly_change,
                color = ~factor(dealer_id),
                text = ~paste(dealer_name, "<br>Average Change:", round(average_change, 2))) |> 
  add_lines() |> 
  layout(
    title = "Weekly Change for Top 5 Decelerators",
    xaxis = list(title = "Week"),
    yaxis = list(title = "Weekly Change"),
    showlegend = FALSE,
    annotations = list(
      text = "Dealers who fall off the chart early<br>have already cancelled. Germain is our most<br>likely client to cancel due to a decreasing trend of use.",
      x = 0.8, 
      y = 0.1,  
      xref = "paper",
      yref = "paper",
      showarrow = FALSE
    )
  )

deceleration_plot

End Note

It is important to track which dealers are accelerating/decelerating so we can catch them before they cancel. This project could be integrated with current data to see which dealers are increasing every week. We can even go further with more data to see why they might be using the tool less each week.